A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you chose this dataset.
The Himalayan Database was created to continue the legacy of Elizabeth Hawley, a journalist who spent decades documenting Himalayan expeditions. The version used here is provided in the TidyTuesday GitHub repository and has been restricted to the years 2020–2024 to manage file size.
There are two main datasets: exped_tidy, which contains details of expeditions, and peaks_tidy, which contains data about the Himalayan peaks.
Variables from exped_tidy

| Variable | Description | Your Notes |
|------------------------|------------------------|------------------------|
| TERMREASON_FACTOR | Reason the expedition ended | Outcome variable (success/failure); also includes injury |
| SEASON_FACTOR | Season of the expedition | Winter is expected to be more hazardous |
| O2USED | Whether supplemental oxygen was used | None |
| MEMBERS | Number of expedition team members | Team size |
| HIRDMEMB | Number of hired staff | None |
| COMRTE | Indicates if expedition was commercially organized | Commercial/non-commercial |
| YEAR | Year of the expedition (2020–2024) | Restricted to 2020–2024 |
| DEATHS | Number of deaths during expedition | Safety outcome |
| FUNDING_TYPE | Commercial, Private, etc. | Derived variable used to filter by funding type (Commercial vs. Private) |

Variables from peaks_tidy

| Variable | Description | Your Notes |
|------------------------|------------------------|------------------------|
| HEIGHTM | Elevation of the peak in meters | High elevation might drive both risk and cost |
| HIMAL_FACTOR | Name of the mountain range (himal) containing the peak | None |
| REGION_FACTOR | Broader geographic region grouping peaks | Geographic variability in danger and cost |
| PSTATUS_FACTOR | Peak status (climbed or unclimbed) | Less frequently climbed peaks are possibly more dangerous |
We chose this dataset because it provides multiple opportunities to explore the intersection of geography, climate, and human decision-making under extreme conditions. It supports both modeling and visualization of expedition risk, success, and outcomes.
Questions
The two questions you want to answer.
Question 1. What factors contribute to the success or failure of a summit?
Question 2. Does an expedition's funding affect safety outcomes?
Analysis plan
A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).
Question 1: What factors contribute to the success or failure of a summit?
We will analyze expedition outcomes using both expedition and peak variables. “Success” will be defined based on the TERMREASON_FACTOR, with specific values representing a successful summit (e.g., “Success (main peak)”, “Success (claimed)”). The aim is to model how geography, team size, oxygen use, and other factors relate to the likelihood of success.
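The success definition above can be sketched as a single derived column. This is a minimal illustration rather than the final implementation: `exped_toy` and its rows are invented stand-ins for `exped_tidy`, the non-success labels are only examples, and the rule "any `TERMREASON_FACTOR` level that starts with `Success` counts as a summit" is an assumption that should be checked against the real factor levels.

```r
library(dplyr)

# Toy stand-in for exped_tidy; these rows are invented for illustration only.
exped_toy <- data.frame(
  EXPID = c("E1", "E2", "E3", "E4", "E5"),
  SEASON_FACTOR = factor(c("Spring", "Spring", "Autumn", "Winter", "Autumn")),
  TERMREASON_FACTOR = factor(c(
    "Success (main peak)",
    "Bad weather (storms, high winds)",
    "Success (claimed)",
    "Accident (death or serious injury)",
    "Success (main peak)"
  ))
)

# Assumption: every level beginning with "Success" is treated as a summit.
exped_flagged <- exped_toy |>
  mutate(
    SUMMIT_SUCCESS = startsWith(as.character(TERMREASON_FACTOR), "Success")
  )

# Share of successful expeditions per season
success_by_season <- exped_flagged |>
  group_by(SEASON_FACTOR) |>
  summarise(
    n_exped = n(),
    success_rate = mean(SUMMIT_SUCCESS),
    .groups = "drop"
  )

print(success_by_season)
```

Keeping `SUMMIT_SUCCESS` as a logical column means the same flag can serve as a filter, a grouping variable, or a binary model outcome.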
We will filter by success and explore facets of the data. Below is an example of how we might focus our attention on correlated variables.
These are some of the variables that we think might be promising. Note that the correlation matrix above covers numeric variables only and does not include other data types. Neither the list nor the correlation plot is exhaustive.
From exped_tidy: TERMREASON_FACTOR, SEASON_FACTOR, O2USED, MEMBERS, HIRDMEMB, COMRTE, YEAR
From peaks_tidy: HEIGHTM, HIMAL_FACTOR, REGION_FACTOR, PSTATUS_FACTOR
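Because the candidate predictors come from two tables, one plausible workflow is to join them and then fit a logistic regression on the binary outcome. The sketch below runs on simulated data only: the rows, the simulated success probabilities, and the helper column `p_success` are hypothetical, and `PEAKID` is assumed to be the key shared by `exped_tidy` and `peaks_tidy`. The column names `MEMBERS` and `O2USED` follow the variable table and should be verified against the actual data.

```r
library(dplyr)

set.seed(526)  # reproducible toy data

# Invented peak table; PEAKID is the assumed join key.
peaks_toy <- data.frame(
  PEAKID = c("P1", "P2", "P3"),
  HEIGHTM = c(6500, 7400, 8800)
)

# Invented expeditions
n_exped <- 300
exped_toy <- data.frame(
  PEAKID = sample(peaks_toy$PEAKID, n_exped, replace = TRUE),
  MEMBERS = sample(2:15, n_exped, replace = TRUE),
  O2USED = sample(c(TRUE, FALSE), n_exped, replace = TRUE)
)

# Attach peak attributes, then simulate an outcome (hypothetical effect sizes)
model_data <- exped_toy |>
  left_join(peaks_toy, by = "PEAKID") |>
  mutate(
    p_success = plogis(4 - 0.0006 * HEIGHTM + 0.8 * O2USED),
    SUMMIT_SUCCESS = rbinom(n(), size = 1, prob = p_success)
  )

# Logistic regression of success on peak height, team size, and oxygen use
fit <- glm(
  SUMMIT_SUCCESS ~ HEIGHTM + MEMBERS + O2USED,
  data = model_data,
  family = binomial()
)

# Coefficients on the odds-ratio scale
print(exp(coef(fit)))
```

With the real data, categorical predictors such as `SEASON_FACTOR` or `REGION_FACTOR` could be added to the formula in the same way, and rescaling `HEIGHTM` (for example to kilometres) would make its coefficient easier to read.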
Variables to be created:
SUMMIT_SUCCESS
FUNDING_TYPE
SUMMIT_SUCCESS will be a composite variable derived from TERMREASON_FACTOR, possibly grouped by country, team, etc.
It will be used to filter our data for further analysis.
Question 2. Does an expedition's funding affect safety outcomes?
The analysis for this question will closely follow that of Question 1. We expect some overlap, for example between funding and success rate. We will use many of the same variables, with a particular focus on FUNDING_TYPE and on DEATHS as the safety outcome.
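A first pass at this comparison might look like the sketch below. It is an illustration under stated assumptions, not the planned method: the rows are invented, the rule that `FUNDING_TYPE` is derived from `COMRTE` is our guess at one possible derivation, and `MEMBERS` and `DEATHS` are used as named in the variable table.

```r
library(dplyr)

# Invented rows for illustration; not real expedition records.
exped_toy <- data.frame(
  EXPID = paste0("E", 1:6),
  COMRTE = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  MEMBERS = c(10, 8, 12, 4, 3, 5),
  DEATHS = c(0, 1, 0, 0, 1, 0)
)

# Assumption: FUNDING_TYPE is derived from COMRTE (hypothetical rule).
safety_by_funding <- exped_toy |>
  mutate(FUNDING_TYPE = if_else(COMRTE, "Commercial", "Private")) |>
  group_by(FUNDING_TYPE) |>
  summarise(
    n_exped = n(),
    total_members = sum(MEMBERS),
    total_deaths = sum(DEATHS),
    deaths_per_100_members = 100 * total_deaths / total_members,
    .groups = "drop"
  )

print(safety_by_funding)
```

Normalising deaths by team size matters here because commercial expeditions may be larger, so raw death counts alone could be misleading.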
NOTE: Add “real world” examples of expedition success or failure to make analysis relatable.
Source Code
---
title: "A Statistical Exploration of Himalayan Expeditions"
subtitle: "INFO 526 Final Project Proposal"
author:
  - name: "Trevor Macdonald & Nandakumar Kuthalaraja"
    affiliations:
      - name: "School of Information, University of Arizona"
description: >
  Analysis of Himalayan expeditions from 2020–2024, exploring the factors that influence success, failure, and safety outcomes.
format:
  html:
    toc: true
    toc-depth: 2
    code-tools: true
    code-overflow: wrap
    code-line-numbers: true
    embed-resources: true
    code-fold: true
editor: visual
code-annotations: hover
execute:
  warning: false
---

```{r}
#| label: load-pkgs
#| message: false

# Load (and install if missing) essential libraries
if (!require("pacman")) install.packages("pacman")

pacman::p_load(
  tidyverse,     # Core tidyverse packages
  here,          # File path management
  ggrepel,       # Smart text labels
  ggthemes,      # Extra themes for ggplot2
  scales,        # Better axis formatting
  waffle,        # Waffle charts
  glue,          # String interpolation
  openintro,     # Datasets
  patchwork,     # Combine ggplots
  tidytuesdayR,  # TidyTuesday data access
  GGally         # ggcorr() correlation plots
)

# Set default theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 14))

# Global output width
options(width = 65)

# Global knitr options
knitr::opts_chunk$set(
  fig.width = 7,         # 7" width
  fig.asp = 0.618,       # Golden ratio aesthetic
  fig.retina = 3,        # dpi multiplier for displaying HTML output on retina
  fig.align = "center",  # Center align figures
  dpi = 300              # Higher dpi, sharper image
)
```

## Dataset

```{r}
#| label: load-dataset
#| message: false

# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load(2025, week = 3)

exped_tidy <- tuesdata$exped_tidy
peaks_tidy <- tuesdata$peaks_tidy

# Save as CSVs (create the data folder first if it does not exist)
dir.create(here::here("data"), showWarnings = FALSE)
write_csv(exped_tidy, here::here("data", "exped_tidy.csv"))
write_csv(peaks_tidy, here::here("data", "peaks_tidy.csv"))
```

1.  A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you chose this dataset.

The Himalayan Database was created to continue the legacy of Elizabeth Hawley, a journalist who spent decades documenting Himalayan expeditions. The version used here is provided in the [**TidyTuesday GitHub repository**](https://github.com/rfordatascience/tidytuesday/tree/main/data/2025/2025-01-21) and has been restricted to the years 2020–2024 to manage file size.

There are two main datasets: `exped_tidy`, which contains details of expeditions, and `peaks_tidy`, which contains data about the Himalayan peaks.

```{r}
# Load TidyTuesday data inline
tuesdata <- tidytuesdayR::tt_load(2025, week = 3)

exped_tidy <- tuesdata$exped_tidy
peaks_tidy <- tuesdata$peaks_tidy
```

```{r}
# Dimensions of each tibble
dim(exped_tidy)
dim(peaks_tidy)
```

### Variables from `exped_tidy`

| Variable | Description | Your Notes |
|------------------------|------------------------|------------------------|
| `TERMREASON_FACTOR` | Reason the expedition ended | Outcome variable (success/failure); also includes injury |
| `SEASON_FACTOR` | Season of the expedition | Winter is expected to be more hazardous |
| `O2USED` | Whether supplemental oxygen was used | None |
| `MEMBERS` | Number of expedition team members | Team size |
| `HIRDMEMB` | Number of hired staff | None |
| `COMRTE` | Indicates if expedition was commercially organized | Commercial/non-commercial |
| `YEAR` | Year of the expedition (2020–2024) | Restricted to 2020–2024 |
| `DEATHS` | Number of deaths during expedition | Safety outcome |
| `FUNDING_TYPE` | Commercial, Private, etc. | Derived variable used to filter by funding type (Commercial vs. Private) |

### Variables from `peaks_tidy`

| Variable | Description | Your Notes |
|------------------------|------------------------|------------------------|
| `HEIGHTM` | Elevation of the peak in meters | High elevation might drive both risk and cost |
| `HIMAL_FACTOR` | Name of the mountain range (himal) containing the peak | None |
| `REGION_FACTOR` | Broader geographic region grouping peaks | Geographic variability in danger and cost |
| `PSTATUS_FACTOR` | Peak status (climbed or unclimbed) | Less frequently climbed peaks are possibly more dangerous |

We chose this dataset because it provides multiple opportunities to explore the intersection of geography, climate, and human decision-making under extreme conditions. It supports both modeling and visualization of expedition risk, success, and outcomes.

## Questions

The two questions you want to answer.

Question 1. What factors contribute to the success or failure of a summit?

Question 2. Does an expedition's funding affect safety outcomes?

## Analysis plan

-   A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

**Question 1: What factors contribute to the success or failure of a summit?**

We will analyze expedition outcomes using both expedition and peak variables. "Success" will be defined based on the `TERMREASON_FACTOR`, with specific values representing a successful summit (e.g., "Success (main peak)", "Success (claimed)"). The aim is to model how geography, team size, oxygen use, and other factors relate to the likelihood of success.

We will filter by success and explore facets of the data. Below is an example of how we might focus our attention on correlated variables.

```{r}
# Select only numeric variables
exped_numeric <- exped_tidy |>
  select(where(is.numeric))

ggcorr(
  exped_numeric,
  label = TRUE,
  label_alpha = TRUE,
  label_size = 3,
  hjust = 1,
  layout.exp = 1
) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
    axis.text.y = element_text(size = 8),
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 10)
  ) +
  labs(
    title = "Correlation Matrix of Expedition Variables",
    subtitle = "numeric variables only*",
    caption = "Source: Himalayan Database (2020–2024)"
  )
```

#### Variables involved:

These are some of the variables that we think might be promising. Note that the correlation matrix above covers numeric variables only and does not include other data types. Neither the list nor the correlation plot is exhaustive.

-   From `exped_tidy`:\
    `TERMREASON_FACTOR`, `SEASON_FACTOR`, `O2USED`, `MEMBERS`, `HIRDMEMB`, `COMRTE`, `YEAR`

-   From `peaks_tidy`:\
    `HEIGHTM`, `HIMAL_FACTOR`, `REGION_FACTOR`, `PSTATUS_FACTOR`

#### Variables to be created:

-   `SUMMIT_SUCCESS`
-   `FUNDING_TYPE`

`SUMMIT_SUCCESS` will be a composite variable derived from `TERMREASON_FACTOR`, possibly grouped by country, team, etc.

It will be used to filter our data for further analysis.

**Question 2. Does an expedition's funding affect safety outcomes?**

The analysis for this question will closely follow that of Question 1. We expect some overlap, for example between funding and success rate. We will use many of the same variables, with a particular focus on `FUNDING_TYPE` and on `DEATHS` as the safety outcome.

NOTE: Add "real world" examples of expedition success or failure to make analysis relatable.